# Multimodal LLM
Slowfast Video Mllm Qwen2 7b Convnext 576 Frame96 S1t6
A video multimodal large language model that uses a slow-fast architecture to balance temporal resolution and spatial detail in video understanding, working around the sequence-length limits of conventional large language models.
Video-to-Text
Transformers

shi-labs
81
0
Slowfast Video Mllm Qwen2 7b Convnext 576 Frame64 S1t4
A video multimodal large language model that uses a slow-fast architecture to balance temporal resolution and spatial detail, supporting 64-frame video understanding.
Video-to-Text
Transformers

shi-labs
184
0
Mini Ichigo Llama3.2 3B S Instruct
Apache-2.0
A multimodal language model based on the Llama-3 architecture that natively accepts audio and text input, with a focus on improving large language models' understanding of audio.
Text-to-Audio English
homebrewltd
14
34
Videollm Online 8b V1plus
MIT
VideoLLM-online is a multimodal large language model based on Llama-3-8B-Instruct, focusing on online video understanding and video-text generation tasks.
Video-to-Text English
chenjoya
1,688
23
Featured Recommended AI Models